
    How Data Drive Early Word Learning: A Cross-Linguistic Waiting Time Analysis

    The extent to which word learning is delayed by maturation as opposed to accumulating data is a longstanding question in language acquisition. Further, the precise way in which data influence learning on a large scale is unknown: experimental results reveal that children can rapidly learn words from single instances as well as by aggregating ambiguous information across multiple situations. We analyze Wordbank, a large cross-linguistic dataset of word acquisition norms, using a statistical waiting time model to quantify the role of data in early language learning, building on Hidaka (2013). We find that the model both fits and accurately predicts the shape of children’s growth curves. Further analyses of model parameters suggest a primarily data-driven account of early word learning. The parameters of the model directly characterize both the amount of data required and the rate at which informative data occur. With high statistical certainty, words require on the order of ∼ 10 learning instances, which occur on average once every two months. Our method is extremely simple, statistically principled, and broadly applicable to modeling data-driven learning effects in development.
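
    The growth-curve logic is easy to sketch. Assuming a simple Poisson/Gamma waiting-time form (which may differ in detail from the paper's actual model), a word that needs n informative learning instances arriving at rate λ per month is acquired at a Gamma-distributed age, and the proportion of children producing it by age t is the Gamma CDF at t:

```python
# Sketch of a waiting-time growth curve (assumed Gamma/Poisson form, not the paper's
# exact specification). If a word needs n informative exposures arriving at Poisson
# rate lam per month, the age of acquisition is Gamma(n, scale=1/lam), and the
# fraction of children producing the word by age t months is the Gamma CDF at t.
import numpy as np
from scipy.stats import gamma

n = 10      # ~10 learning instances (order of magnitude reported in the abstract)
lam = 0.5   # roughly one informative instance every two months
ages = np.arange(8, 37)                      # months, roughly the CDI age range
p_produce = gamma.cdf(ages, a=n, scale=1 / lam)

for t, p in zip(ages, p_produce):
    print(f"{int(t):2d} months: expected proportion producing the word = {p:.2f}")
```

    With the shape parameter near 10 and the rate near one instance per two months, the values reported in the abstract, this form yields sigmoid-shaped acquisition curves over the typical CDI age range.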

    Logical word learning: The case of kinship

    We examine the conceptual development of kinship through the lens of program induction. We present a computational model for the acquisition of kinship term concepts, resulting in the first computational model of kinship learning that is closely tied to developmental phenomena. We demonstrate that our model can learn several kinship systems of varying complexity using cross-linguistic data from English, Pukapuka, Turkish, and Yanomamö. More importantly, the behavioral patterns observed in children learning kinship terms, under-extension and over-generalization, fall out naturally from our learning model. We then conduct interviews to simulate realistic learning environments and demonstrate that the characteristic-to-defining shift is a consequence of our learning model in naturalistic contexts containing abstract and concrete features. We use model simulations to understand the influence of logical simplicity and children’s learning environment on the order of acquisition of kinship terms, providing novel predictions for the learning trajectories of these words. We conclude with a discussion of how this model framework generalizes beyond kinship terms, as well as a discussion of its limitations.
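
    As a rough illustration of the program-induction idea (the family facts, primitives, and names below are invented for exposition and are not the paper's actual grammar), kinship hypotheses can be written as short programs composed from relational primitives:

```python
# Hypothetical illustration of kinship terms as short relational programs.
# The family facts and primitives are invented; the paper's model induces such
# programs from a probabilistic grammar over logical primitives.
PARENT = {("Mary", "Sue"), ("Tom", "Sue"), ("Sue", "Ann"), ("Bob", "Ann")}

def parents(x):
    """Primitive relation: the set of x's parents."""
    return {a for (a, b) in PARENT if b == x}

def grandparents(x):
    """Composed hypothesis: a grandparent is a parent of a parent."""
    return {g for p in parents(x) for g in parents(p)}

print(parents("Ann"))        # {'Sue', 'Bob'}
print(grandparents("Ann"))   # {'Mary', 'Tom'}
```

    The abstract's appeal to logical simplicity corresponds, in this sketch, to preferring shorter compositions when several candidate programs fit the observed examples, which is how under-extension and over-generalization can arise early in learning.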

    The Goldilocks Effect: Human Infants Allocate Attention to Visual Sequences That Are Neither Too Simple Nor Too Complex

    Human infants, like immature members of any species, must be highly selective in sampling information from their environment to learn efficiently. Failure to be selective would waste precious computational resources on material that is already known (too simple) or unknowable (too complex). In two experiments with 7- and 8-month-olds, we measure infants’ visual attention to sequences of events varying in complexity, as determined by an ideal learner model. Infants’ probability of looking away was greatest on stimulus items whose complexity (negative log probability) according to the model was either very low or very high. These results suggest a principle of infant attention that may have broad applicability: infants implicitly seek to maintain intermediate rates of information absorption and avoid wasting cognitive resources on overly simple or overly complex events.
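
    The complexity measure is the negative log predictive probability of each event under an ideal learner. A minimal sketch, assuming a simple Dirichlet–multinomial learner over a small set of outcomes (the paper's model tracks the stimulus sequences in more detail):

```python
# Negative log predictive probability (surprisal) of each item in a sequence under
# an assumed Dirichlet-multinomial ideal learner; the "Goldilocks" finding is that
# look-away probability peaks when this quantity is very low or very high.
import math
from collections import Counter

def surprisals(sequence, n_outcomes=3, alpha=1.0):
    """Negative log2 probability of each item given the items seen so far."""
    counts = Counter()
    out = []
    for item in sequence:
        p = (counts[item] + alpha) / (sum(counts.values()) + alpha * n_outcomes)
        out.append(-math.log2(p))
        counts[item] += 1
    return out

seq = ["A", "A", "A", "B", "A", "A", "C"]
for item, s in zip(seq, surprisals(seq)):
    print(f"{item}: {s:.2f} bits")
```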

    Modeling infant object perception as program induction

    Infants expect physical objects to be rigid and persist through space and time and in spite of occlusion. Developmentalists frequently attribute these expectations to a "core system" for object recognition. However, it is unclear if this move is necessary. If object representations emerge reliably from general inductive learning mechanisms exposed to small amounts of environmental data, it could be that infants simply induce these assumptions very early. Here, we demonstrate that a domain-general learning system, previously used to model concept learning and language learning, can also induce models of these distinctive "core" properties of objects after exposure to a small number of examples. Across eight micro-worlds inspired by experiments from the developmental literature, our model generates concepts that capture core object properties, including rigidity and object persistence. Our findings suggest infant object perception may rely on a general cognitive process that creates models to maximize the likelihood of observations.
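
    The selection principle at the end, choosing whichever model makes the observations most likely, can be illustrated with a deliberately tiny example (the events, noise level, and hypotheses below are invented; the paper induces far richer programs over its micro-worlds):

```python
# Toy, made-up illustration of "choose the model that maximizes the likelihood of
# observations": compare a persistence hypothesis against a no-persistence one on
# simple occlusion events (1 = object reappears when the occluder is removed).
import math

observations = [1, 1, 1, 1, 1, 0]   # mostly reappearances, one noisy trial (invented data)

def log_likelihood(p_reappear, data, noise=0.05):
    # Each hypothesis predicts reappearance with some probability, softened by noise.
    p = p_reappear * (1 - noise) + (1 - p_reappear) * noise
    return sum(math.log(p if x else 1 - p) for x in data)

hypotheses = {"objects persist": 1.0, "objects vanish behind occluders": 0.0}
for name, p in hypotheses.items():
    print(f"{name}: log-likelihood = {log_likelihood(p, observations):.2f}")
```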

    The communicative function of ambiguity in language

    We present a general information-theoretic argument that all efficient communication systems will be ambiguous, assuming that context is informative about meaning. We also argue that ambiguity allows for greater ease of processing by permitting efficient linguistic units to be re-used. We test predictions of this theory in English, German, and Dutch. Our results and theoretical analysis suggest that ambiguity is a functional property of language that allows for greater communicative efficiency. This provides theoretical and empirical arguments against recent suggestions that core features of linguistic systems are not designed for communication.
    National Science Foundation (U.S.) (Grant 0844472)
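
    The core quantity is the conditional entropy H(M | C): when context C is informative about meaning M, a speaker only needs to convey roughly H(M | C) bits rather than H(M), so short forms can be re-used across contexts and disambiguated by the listener. A toy calculation with an invented joint distribution over contexts and two senses of an ambiguous word:

```python
# Toy illustration of the information-theoretic point (the numbers are made up):
# if context C is informative about meaning M, only about H(M | C) bits must be
# transmitted, so short, ambiguous forms can be efficiently re-used.
import math

# Joint distribution over (context, meaning) for two senses of an ambiguous word,
# each strongly predicted by its context.
p_joint = {("river", "bank_of_river"): 0.45, ("river", "bank_financial"): 0.05,
           ("money", "bank_financial"): 0.45, ("money", "bank_of_river"): 0.05}

def H(dist):
    """Shannon entropy in bits of a distribution given as a dict of probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

p_m, p_c = {}, {}
for (c, m), p in p_joint.items():
    p_m[m] = p_m.get(m, 0) + p
    p_c[c] = p_c.get(c, 0) + p

H_M = H(p_m)                    # bits needed to identify the meaning without context
H_MC = H(p_joint) - H(p_c)      # H(M | C) = H(M, C) - H(C), the cost with context
print(f"H(M) = {H_M:.2f} bits; H(M | C) = {H_MC:.2f} bits")
```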

A Corpus Investigation of Syntactic Embedding in Pirahã

    The Pirahã language has been at the center of recent debates in linguistics, in large part because it is claimed not to exhibit recursion, a purported universal of human language. Here, we present an analysis of a novel corpus of natural Pirahã speech that was originally collected by Dan Everett and Steve Sheldon. We make the corpus freely available for further research. In the corpus, Pirahã sentences have been shallowly parsed and given morpheme-aligned English translations. We use the corpus to investigate the formal complexity of Pirahã syntax by searching for evidence of syntactic embedding. In particular, we search for sentences which could be analyzed as containing center-embedding, sentential complements, adverbials, complementizers, embedded possessors, conjunction or disjunction. We do not find unambiguous evidence for recursive embedding of sentences or noun phrases in the corpus. We find that the corpus is plausibly consistent with an analysis of Pirahã as a regular language, although this is not the only plausible analysis.
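
    The search itself amounts to scanning shallow parses for clause-within-clause configurations. A hypothetical sketch (the bracket notation, tag names, and example sentences are invented and do not reflect the corpus's actual annotation scheme):

```python
# Hypothetical sketch of the kind of pattern search described in the abstract:
# scanning bracketed shallow parses for a clause (S) opening inside another clause.
# The tags and example parses below are invented for illustration only.
import re

parsed = [
    "[S [NP hi] [VP [V xobai] [NP xisi]]]",                    # simple clause
    "[S [NP ti] [VP [V gai] [S [NP hi] [VP [V xobai]]]]]",     # candidate S-under-S
]

def has_embedded_clause(parse):
    """Return True if an S constituent opens before an earlier S has closed."""
    depth, s_depth = 0, []
    for tok in re.findall(r"\[S|\[|\]", parse):
        if tok == "[S":
            if s_depth:
                return True
            s_depth.append(depth)
            depth += 1
        elif tok == "[":
            depth += 1
        else:
            depth -= 1
            if s_depth and depth == s_depth[-1]:
                s_depth.pop()
    return False

for p in parsed:
    print(has_embedded_clause(p), p)
```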